Mixtures of Rectangles: Interpretable Soft Clustering
Abstract
To be effective, data mining has to conclude with a succinct description of the data. To this end, we explore a clustering technique that finds dense regions in data. By constraining our model in a specific way, we are able to represent the interesting regions as an intersection of intervals. This has the advantage of being easily read and understood by humans. Specifically, we fit the data to a mixture model in which each component is a hyper-rectangle in M-dimensional space. Hyper-rectangles may overlap, meaning some points can have soft membership in several components. Each component is described simply by the lower and upper bounds, for each attribute, of the points in the cluster. The computational problem of finding a locally maximum-likelihood collection of k rectangles is made practical by allowing the rectangles to have soft "tails" in the early stages of an EM-like optimization scheme. Our method requires no user-supplied parameters except for the desired number of clusters. These advantages make it highly attractive for "turn-key" data-mining applications. We demonstrate the usefulness of the method in subspace clustering for synthetic data, and in real-life datasets. We also show its effectiveness in a classification setting.
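The soft membership described in the abstract can be illustrated with a minimal sketch. The abstract does not give the form of the soft tails, so the code below assumes one plausible choice: a density that is flat inside each interval and decays like a Gaussian of width `sigma` outside it. The function names, the `sigma` parameter, and the mixing weights are all hypothetical and are not taken from the paper.

```python
import math

def tailed_interval_density(x, lower, upper, sigma):
    """1-D density: flat on [lower, upper], Gaussian tails outside (assumed form).

    Assumes upper > lower or sigma > 0, so the normalizer is positive.
    With sigma == 0 this reduces to a uniform density on the interval.
    """
    # Flat part integrates to (upper - lower); the two half-Gaussian
    # tails together integrate to sigma * sqrt(2 * pi).
    norm = (upper - lower) + sigma * math.sqrt(2.0 * math.pi)
    if x < lower:
        dist = lower - x
    elif x > upper:
        dist = x - upper
    else:
        dist = 0.0
    if dist == 0.0:
        return 1.0 / norm
    if sigma == 0.0:
        return 0.0  # hard rectangle: no mass outside the bounds
    return math.exp(-0.5 * (dist / sigma) ** 2) / norm

def rectangle_density(point, lowers, uppers, sigma):
    """Density of one hyper-rectangle component: product over attributes."""
    density = 1.0
    for x, lo, hi in zip(point, lowers, uppers):
        density *= tailed_interval_density(x, lo, hi, sigma)
    return density

def responsibilities(point, rectangles, weights, sigma):
    """Soft membership of a point in each component (the E-step of EM)."""
    scores = [
        w * rectangle_density(point, lo, hi, sigma)
        for (lo, hi), w in zip(rectangles, weights)
    ]
    total = sum(scores)
    if total == 0.0:
        return [0.0] * len(scores)  # point lies outside every hard rectangle
    return [s / total for s in scores]

def describe(lowers, uppers, names):
    """Render a component as a human-readable intersection of intervals."""
    return " AND ".join(
        f"{lo} <= {name} <= {hi}" for name, lo, hi in zip(names, lowers, uppers)
    )
```

For two overlapping rectangles such as `((0, 0), (2, 2))` and `((1, 1), (3, 3))` with equal weights, a point in the overlap receives equal membership in both components, while a point covered by only one hard rectangle belongs entirely to it. Shrinking `sigma` toward zero over the iterations would move the model from soft tails to hard rectangle boundaries, which is one way to read the "early stages" remark in the abstract.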
Similar resources
Hyper-rectangle-based Discriminative Data Generalization and Applications in Data Mining
The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-ba...
Demixing and orientational ordering in mixtures of rectangular particles.
Using scaled-particle theory for binary mixtures of two-dimensional hard particles with orientational degrees of freedom, we analyze the stability of phases with orientational order and the demixing phase behavior of a variety of mixtures. Our study is focused on cases where at least one of the components consists of hard rectangles, or a particular case of these, hard squares. A pure fluid of ...
Effective classification of 3D image data using partitioning methods
We propose partitioning-based methods to facilitate the classification of 3-D binary image data sets of regions of interest (ROIs) with highly non-uniform distributions. The first method is based on recursive dynamic partitioning of a 3-D volume into a number of 3-D hyper-rectangles. For each hyper-rectangle, we consider, as a potential attribute, the number of voxels (volume elements) that bel...
New distance and similarity measures for hesitant fuzzy soft sets
The hesitant fuzzy soft set (HFSS), as a combination of hesitant fuzzy and soft sets, is regarded as a useful tool for dealing with the uncertainty and ambiguity of real-world problems. In HFSSs, each element is defined in terms of several parameters with arbitrary membership degrees. In addition, distance and similarity measures are considered as the important tools in different areas such as ...